Principal component regression

In statistics, principal component regression (PCR) is a regression analysis technique that uses principal component analysis when estimating regression coefficients. It is used to overcome problems that arise when the explanatory variables are close to being collinear.[1]

In PCR, instead of regressing the dependent variable on the independent variables directly, one regresses it on the principal components of the independent variables. Typically only a subset of the principal components is used in the regression, which makes PCR a kind of regularized estimation.
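
The idea can be sketched in symbols. The notation is an assumption made here for illustration and is not taken from the cited sources: X is the centered n×p matrix of independent variables, y is the centered dependent variable, and V_k is the p×k matrix whose columns are the k retained principal directions (eigenvectors of X^T X).

```latex
Z_k = X V_k, \qquad
\hat{\gamma}_k = (Z_k^{\top} Z_k)^{-1} Z_k^{\top} y, \qquad
\hat{\beta}_{\mathrm{PCR}} = V_k \hat{\gamma}_k
```

When k = p and X has full column rank, this reproduces the ordinary least squares estimate. When k < p, the estimate is confined to the span of the retained directions, which is where the regularizing effect comes from.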

Often the principal components with the highest variance are selected. However, the low-variance principal components may also be important, in some cases even more important.[2]
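
A small constructed example may make this concrete. The data below are made up for illustration (they are not from Jolliffe's paper): two orthogonal explanatory variables, one with much larger variance than the other, and a dependent variable that depends only on the low-variance one. The sketch uses NumPy and computes, for each principal component, the share of the variance of y that the component explains on its own.

```python
import numpy as np

# Made-up data: two centered, mutually orthogonal columns.
a = 10.0 * np.array([-2.0, -1.0, 1.0, 2.0])   # high-variance variable
b = 0.1 * np.array([1.0, -2.0, 2.0, -1.0])    # low-variance variable
X = np.column_stack([a, b])
y = b.copy()                                  # y depends only on the low-variance variable

Xc = X - X.mean(axis=0)
yc = y - y.mean()

# PCA via singular value decomposition; components come out ordered
# from highest to lowest variance.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt.T                                 # component scores

# Squared correlation of each component with y (R^2 of a one-component fit).
r2 = (Z.T @ yc) ** 2 / (s ** 2 * (yc @ yc))
print(r2)
```

In this construction the first (high-variance) component explains essentially none of y, while the second (low-variance) component explains essentially all of it, so keeping only the top component by variance would discard the useful one.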

The principle

PCR can be divided into three steps:

  1. Run a principal components analysis on the table of explanatory variables.
  2. Run an ordinary least squares regression (linear regression) of the dependent variable on a selected subset of the components; here the components most correlated with the dependent variable are selected.
  3. Finally, compute the parameters of the model for the original explanatory variables by transforming the component coefficients back.
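
The three steps can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `pcr_fit`, the parameter `k` (the number of components to keep), and the example data are all hypothetical, and the PCA is done on centered but unscaled variables.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Sketch of PCR: returns (intercept, beta) for the original variables.

    `k` is the number of components kept (a hypothetical parameter name).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean

    # Step 1: principal components analysis of the explanatory variables.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    keep = s > 1e-12 * s.max()          # drop numerically null directions
    s, Vt = s[keep], Vt[keep]
    Z = Xc @ Vt.T                       # component scores (orthogonal columns)

    # Step 2: select the k components most correlated with y and regress
    # y on them. Because the scores are orthogonal, each OLS coefficient
    # can be computed separately.
    proj = Z.T @ yc
    order = np.argsort(np.abs(proj) / s)[::-1][:k]
    gamma = proj[order] / s[order] ** 2

    # Step 3: express the model in terms of the original variables.
    beta = Vt[order].T @ gamma
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Made-up example data: y = 1 + 2*x1 - x2 exactly.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]])
y = 1.0 + 2.0 * X[:, 0] - X[:, 1]

intercept_full, beta_full = pcr_fit(X, y, k=2)   # all components kept
intercept_one, beta_one = pcr_fit(X, y, k=1)     # one component kept
print(intercept_full, beta_full)
print(intercept_one, beta_one)
```

With all components kept, the result coincides with ordinary least squares on the original variables; with fewer components, the coefficient vector is restricted to the retained directions and the fit to the training data can only get worse or stay the same.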

References

  1. Dodge, Y. (2003). The Oxford Dictionary of Statistical Terms. OUP. ISBN 0-19-920613-9.
  2. Jolliffe, Ian T. (1982). "A note on the Use of Principal Components in Regression". Journal of the Royal Statistical Society, Series C (Applied Statistics) 31 (3): 300–303. doi:10.2307/2348005. JSTOR 2348005.